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BACKWARD ESTIMATION OF STOCHASTIC PROCESSES WITH 
FAILURE EVENTS AS TIME ORIGINS 1 

By Kwun Chuen Gary Chan and Mei-Cheng Wang 

University of Washington and Johns Hopkins University 

Stochastic processes often exhibit sudden systematic changes in 
pattern a short time before certain failure events. Examples include 
increase in medical costs before death and decrease in CD4 counts 
before AIDS diagnosis. To study such terminal behavior of stochastic 
processes, a natural and direct way is to align the processes using 
failure events as time origins. This paper studies backward stochastic 
processes counting time backward from failure events, and proposes 
one-sample nonparametric estimation of the mean of backward pro- 
cesses when follow-up is subject to left truncation and right censoring. 
We will discuss benefits of including prevalent cohort data to enlarge 
the identifiable region and large sample properties of the proposed 
estimator with related extensions. A SEER-Medicare linked data set 
is used to illustrate the proposed methodologies. 

1. Introduction. Stochastic processes such as recurrent events and re- 
peated measurements are often collected in medical follow-up studies in ad- 
dition to survival data. Examples include recurrent hospitalizations, medical 
cost processes, repeated quality of life measurements and CD4 counts. Such 
processes often exhibit certain terminal behavior during a short time before 
failure events. For example, medical costs tend to increase suddenly before 
death, qualities of lives deteriorate before death and CD4 counts decrease 
before AIDS diagnosis. 

Conventional statistical methodologies mainly focus on stochastic pro- 
cesses that are counting forward from initial events observed for every indi- 
vidual; see Nelson (1988), Pepe and Cai (1993), Lawless and Nadeau (1995), 
Cook and Lawless (1997), Lin et al. (2000) and Wang, Qin and Chiang 
(2001), among others, on recurrent event processes, Lin (2000) on medi- 
cal cost processes and Pawitan and Self (1993) on CD4 count processes. 



Received September 2009. 

1 Supported in part by the National Institutes of Health Grant P01 CA 098252. 
Key words and phrases. Marked process, left truncation, prevalent cohort, recurrent 
event process, recurrent marker process, survival analysis. 

This is an electronic reprint of the original article published by the 

Institute of Mathematical Statistics in The Annals of Applied Statistics. 

2010, Vol. 4, No. 3, 1602-1620. This reprint differs from the original in pagination 

and typographic detail. 



1 



2 



K. C. G. CHAN AND M.-C. WANG 



The conventional views of stochastic processes, however, are not designed 
to study the terminal behavior of processes. Consider medical cost as an 
example. Calculating the mean of cost processes for a population defined at 
an initial event would include both survivors and nonsurvivors at any fixed 
time after the initial event, and the increase in medical cost based on sur- 
vivors' cost measurement is offset by nonsurvivors who do not contribute to 
the increase in medical cost after death. Unless the failure times are constant 
over a population, conventional forward processes do not serve the purpose 
of estimating the terminal behavior of stochastic processes. 

In this paper we directly consider stochastic processes before failure events 
of interest, by introducing backward processes that start at failure events 
and counting backward in time. By aligning the origins of the processes 
to failure events, terminal behavior of stochastic processes could be natu- 
rally and directly studied by the backward processes. We will focus on one- 
sample nonparametric estimation of the mean of backward processes when 
the failure events are partially observed subject to left truncation and right 
censoring. Since failure events and processes right before failure events may 
not be observed, statistical methods are needed to correct a bias induced 
by missingness. Development of methods rely on a stochastic representation 
technique of a marked counting process generalizing that of Huang and Louis 
(1998) and the proposed estimator also generalizes a weighted estimator for 
left truncated and right censored data proposed by Gross and Lai (1996). 

Throughout this paper we will consider medical costs as motivating ex- 
amples. The SEER-Medicare linked data provide illustrative examples of 
medical cost process data collected in a left truncated and right censored 
follow-up sample. The Surveillance, Epidemiology and End-Results (SEER)- 
Medicare linked data are population-based data for studying cancer epidemi- 
ology and quality of cancer-related health services. The SEER-Medicare 
linked data consist of a linkage of two large population-based databases, 
SEER and Medicare. The SEER data contain information of cancer inci- 
dence diagnosed between 1973 and 2002. The Medicare data contain infor- 
mation on medical costs between 1986 and 2004. The linked data consist 
of cancer patients in the SEER data who were enrolled in Medicare during 
the study period of the Medicare data. Details of each data and linkage 
are discussed in Warren et al. (2002). Although the linkage criterion sounds 
simple, it creates a left truncated and right censored sample because the two 
data sets have different starting times. In the SEER-Medicare linked data, 
patients diagnosed with cancer before 1986 form a prevalent cohort, because 
only those patients who survived through 1986 were included. Patients diag- 
nosed with cancer after 1986 form an incident cohort, because those patients 
were recruited at the onset of disease. Patients survived through 2004 were 
considered censored. A prevalent cohort is typically a left truncated and 
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right censored sample and data from a combination of incident and preva- 
lent cohorts are also left truncated and right censored. 

This article is organized as follows. In Section 2 we will introduce back- 
ward processes to study terminal behavior of stochastic processes and dis- 
cuss the differences from conventional forward process models. The proposed 
methods for estimating the mean function of backward processes will be dis- 
cussed in Section 3, together with identifiability problems associated with 
incomplete follow-up, large sample properties of the proposed estimators 
and a method for constructing confidence bands for mean functions. We will 
also discuss two related extensions of the proposed procedure in Section 4, 
one is on distributional estimation of the backward processes, and the other 
is on estimation of derivatives of backward mean functions. Simulations and 
real examples analyzing a SEER-Medicare linked data set will be presented 
in Section 5. Section 6 will include several concluding remarks. 

2. Forward and backward processes. Let Y(t) be a stochastic process 
with bounded variation, where t is the time after an initial event, usually 
defined as the time of disease onset. We call Y(t) = J Q dY(s) a forward 
stochastic process since the time index t in Y(t) starts at the initial event 
and moves forward with calendar time. On the other hand, a backward 
stochastic process is defined as V(u) = J^_ u dY(s), where T is the time 
from the initial event to a failure event of interest and the time origin for 
V(u) is the failure event. In the medical cost example, Y(t) is total medical 
cost within t time units after the initial event, and V(u) is total medical 
cost during the last u time units of life. Figure 1 shows the trajectories of 
forward and backward cost processes for 3 uncensored individuals in the 
SEER-Medicare linked data. 

In Figure 1 we can see an increase in medical cost a short period be- 
fore death. To study this terminal behavior of medical cost processes, it is 
natural to align the processes to a different time origin, the failure event, 
as shown in Figure 1(b). Since terminal behavior of stochastic processes 
usually incur during a short time period before death, relevant scientific 
questions center on a rather short period tq before death, say, 6 months 
or 1 year. To is a prespecified time period related to scientific questions of 
interest. The backward stochastic processes at To time units before failure 
events are only meaningfully defined for a subgroup of patients who survive 
at least To time units, and the estimand of interest is E(V(u)\T > to), for 
u € [0, To]. However, due to limited study duration, only a conditional ver- 
sion fj, T0)T1 (u) = E(V(u)\to < T < n) can be nonparametrically identified, 
where t\ depends on study design and data availability. t\ can be taken as 
the maximum follow-up period, and the time period of interest to is usually 
much shorter than t\. We will further discuss implications of incident and 
prevalent sampling on the identifiability constraints in Section 3.2. 
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To distinguish between processes with time origins at initial events and 
failure events, throughout this paper t denotes a time index counting for- 
ward from initial events and u denotes a time index counting backward from 
failure events. The processes Y(t) and V(u) address different scientific ques- 
tions and have different interpretations. Consider the medical cost example, 
where Y(t) measures medical cost from an initial event. Note that Y(t) will 
not increase after death, so that Y(t) = Y{T) for t>T. The interpretation 
of forward mean function E(Y(t)) is generally confounded with survival per- 
formance. For example, if there are two groups of patients with the same 
spending per unit time when alive but different survival distributions, the 
group with longer survival time will have a higher mean forward cumulative 
cost. There may also be crossovers between mean forward cost curves, be- 
cause patients with severe disease tend to spend more near disease onset but 
die in shorter time than patients with less severe disease. We shall see such 
an example from the SEER-Medicare data set in Section 5.2. On the other 
hand, the time origin of a backward process V{u) is defined to be a failure 
event, and the backward mean function can be interpreted as the mean of 
stochastic processes before failure events. In the medical cost example, when 
financial decision is a major concern (e.g., decision made by insurance com- 
pany), then discounted forward cost may be more relevant. The backward 




Fig. 1. Trajectories of forward and backward cost processes for 3 uncensored individuals 
in the SEER-Medicare linked data, (a) Forward cost processes. Circles represent failure 
events, (b) Backward cost processes. Circles represent diagnoses of cancer. 



BACKWARD ESTIMATION OF STOCHASTIC PROCESSES 



5 



processes essentially answer different types of questions related to end-of-life 
cost, and there is currently a lot of public health interest in comparing and 
evaluating palliative care. This work could provide valid statistical meth- 
ods for public health researchers interested in estimating end-of-life medical 
cost, together with other applications. 

3. Proposed estimation. 

3.1. Data structure. Let T be a failure time, C be a censoring time and 
W be a truncation time. (T,C,W) are defined relative to an initial event. 
Truncation time W is the time between the initial event and the time of 
recruitment. For incident cases, W = 0. For prevalent cases, W > and sur- 
vival data are observed only when T > W, that is, the failure time is left 
truncated. Also, since censoring is only meaningfully defined for subjects 
who are eligible to be sampled, we assume that P(W < C) = 1 as discussed 
in Wang (1991). Let X = min(T,C) and A = I(T < C). In addition to ob- 
serving the usual left truncated and right censored survival data (W,X,A), 
Y(t) is also observed from time of recruitment to death or censoring. We as- 
sume an independent censoring and truncation condition in which {V(-),T} 
is independent of {W, C}. This assumption does not impose any dependent 
structure between the process V(-) and the failure time T. In fact, V(-) and 
T are allowed to be arbitrarily dependent under this assumption and thus 
handle the case of informative failure events. The assumption is similar in 
nature to those imposed for nonparametric estimation of forward mean func- 
tion with informative terminal events; see, for example, Lawless and Nadeau 
(1995), Lin et al. (1997), Strawderman (2000) and Ghosh and Lin (2000). 
Let S(t) = P(T > t) and G(t) = P(X > t > W\T > W), by the independent 
censoring and truncation conditions G(t) = S(t) ■ P(C >t> W)/(3 where 
p = P(T > W). 

To estimate the mean of V{u) for u € [0,To], we only need the following 
minimal data [Huang and Louis (1998)]: 

{W i ,X i ,A i ,{A i V i (u),ue[Q,T }},i = l,...,n}. 

That is, in addition to the survival data, we only need backward process 
data to be available for individuals whose failure events are uncensored. 
For subjects in a prevalent cohort, backward process data may not be fully 
available for individuals who experience failure events within tq from re- 
cruitment. In this case, we may treat recruitment time to be tq after the 
actual recruitment date and W + tq be a new truncation variable for the 
subjects in a prevalent cohort. This is equivalent to artificially truncating a 
small portion of data and it guarantees that V(u),u S [0,to], is observable 
for all uncensored observations with T >tq in the prevalent cohort. 
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3.2. Identifiability. A backward stochastic process can be viewed as a 
marked process attached to a failure event. This is a generalization of marked 
variables considered by Huang and Louis (1998) in which random variables 
are observed at failure events. Because of limited study duration, marginal 
distribution of marked variables cannot be fully identified nonparametri- 
cally. The same applies to backward stochastic processes because we do not 
have data on backward processes for subjects with survival time greater 
than 7~i, which is the maximum support of the censoring time. In view of 
this identifiability problem, together with the fact that stochastic processes 
within a prespecified time period of interest tq before failure events are only 
meaningfully defined for the subgroup of individuals who survives at least 
To time units, we confine ourselves to estimate a conditional version of back- 
ward mean function, fJ- T0 ,Ti(u) — E(V(u)\tq < T < T\), for u £ [0,to]. If the 
maximum support of T is at most n, then E(V(u)\T > tq) can be estimated 
for u G [0,To]. 

In an incident cohort, t\ is usually the maximum follow-up duration, 
which is determined by study design. In a prevalent cohort, t\ is the longest 
observation time, which is usually longer than the maximum follow-up time 
because subjects have already experienced the initial events before recruit- 
ment. An important implication of using prevalent cohort data is that it 
allows us to identify a larger portion, possibly all, of the right tail of the 
survival distribution. For example, Figure 2 shows the estimated survival 
probabilities for ovarian cancer patients in three different historic stages at 
diagnosis. For all three groups of patients, the full right tail of survival dis- 
tributions can be identified when the full data set is considered, but not in 
the case when we only analyze the incident cohort. If we only analyze the 
incident cohort data, t\ is 18 years, which is the maximum follow-up period 
for the incident cohort in the data set. When we include prevalent cohort 
data in the analysis, we can estimate E(V(u)\T > tq) nonparametrically. 

3.3. Proposed estimator. We propose an estimator for the backward mean 
function ^ TOjT1 (ii) by using marked counting process arguments extending 
those of Huang and Louis (1998). Let Ni(t) = I(Xi < t, Aj = 1), i = 1, . . . , n, 
be counting processes for observed failure, Ri(t) = I(Xi >t> Wi) be at-risk 
indicators, and 



be marked counting processes for observed failure with a random marker 



Vi(u). Define averaged processes N(t) = n 1 x J27=i N (t, u) = n 1 x 

YJi=i N Y i 1 ^) and R(t)=n~ 1 x ElLi^iW- Furthermore, let A T (s) be the 




if t > t 
if t < T 



V i (u)I{To<X i <t,A i = l) 
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Time from onset (Years) 



Fig. 2. Estimates of survival probabilities for ovarian cancer patients in the SEER-Medi- 
care data, using only incident cohort data (bold) and using data from both incident and 
prevalent cohorts (nonbold). Solid curves represent localized stage at diagnosis, dashed 
curves represent regional stage and dotted curves represent distant stage. 



cumulative hazard function for T and A^(i,u) = J"^ E(V(u)\T = s)Ay(ds). 

A}f (t,u) can be interpreted as a hazard weighted cumulative mean of back- 
ward processes, which is called cumulative mark-specific hazard function in 
Huang and Louis (1998). 
Note that 

E(V(u)I(r <T< n ))= P S(s)Al(ds,u). 

J T 

If we have an estimate of (t , u) , denoted by A^ (t , u) , then we can estimate 

E(V(u)I{t <T< n)) by Q S(s)kY (ds,u), where S(t) is the product limit 

estimate using left truncated and right censored data [Tsai, Jewell and Wang 
(1987), Lai and Ying (1991)]. Since 

Al(t,u)= f E{V(u)\T = s)K T {ds) 

E(V(u)I(T = s))P(C>s>W)/P , f l E(N v (ds,u)) 

ds — 



S(s)P{C>s>W)/P J T0 G(s) ' 

where the expectation is taken conditioning on T > W, A^ Q (t,u) can be 
estimated by 

"* N v (ds,u) 



(3.1) KY(t,u) 



TO 



R(s) 
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The backward mean function /z TOjT1 (u) can then be estimated by 



1 /" r l 



Ar ,r 1 (u) = 7 — j— — / S(S)A T0 ((IS,U) 

S{T ) - S{n) Jt 

(3.2) 



1 1 y. ^(X t )A t ^(n)/(r <X,<T 1 ) 

n5(To)-5(n)^ R(Xi) 

More generally, we can estimate fJ-t!,t 2 ( u ) = E(V(u)\ti < T < £2) f° r u < 
ii < £2 < r i and it G [0, To], which can be estimated by 

. , , 1 1 A^(X i )A i y i (u)I(t 1 <X i <t 2 ) 



The mean of V(ii) can be estimated as long as T > u. However, if we 
estimate E(V(u)\u < T < n), the subpopulation defined by conditioning 
changes with the time index u, and the estimand loses a desirable interpre- 
tation of being a mean process for a fixed underlying population. Although 
the introduction of the constant To in the conditioning may not use infor- 
mation from part of the data, it defines a meaningful subpopulation such 
that the whole backward function can be studied for the same underlying 
population. 

The following theorem states the large sample properties of the proposed 
estimator. 

Theorem 3.1. Assume that E(V 2 (tq)) < 00 and certain technical re- 
strictions on the support of (T,C,W) hold [Wang (1991)]. For To <t\ < 
h<T\, £ti,t 2 ( u ) -> fHiteiu) uniformly a.s. on [0,t ]. Also, n 1/2 (fi tltt2 (u) - 
fHi,t 2 ( u )) = n~ 1 / 2 X^ILi £i( u ) + °p(1)> where £i(u) is defined in the Appendix 
and the random sequence converges weakly to a Gaussian process with co- 
variance function 

C tl ,t 2 (u,v) = tQU , 1 QU - 2 f 2 ^E(V(u)V(v)\T = s)A T {ds) 

S{s)H tlM {s,v) y 

A (ds,u) 



(s(h)-s(t 2 )yj tl G( S ) 

(3.3) 









1 








1 



tl 



G(s) 



t2 S{s)Ht ^ {s ^A v (ds,v) 



G(s 

(S(h)-S(t 2 ))*J tl 2 ~~G(s 



r Ht ^ a ^^ M A T (ds), 
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where 



H tuta ( 8i u) = E(V(u)I(s <T< t 2 ))S(h) + E{V(u)I{t! <T< s))S(t 2 ). 



From (3.3), Ct 1: t 2 (u,v) can De consistently estimated by 



~ , , if Ailih < Xi < t 2 ) 
(3.4) 



Ht-i ,t 2 (Xi , v 



S(Xi)Vi(u) 



H tlt t 2 (Xj,u) 
(S(h)-S(t 2 )) 



x 



S(Xi)Vi(v) 



(5(ti)-5(t 2 ))J 



where 



1 ™ 

H tl:t2 (s,u) = -J2(^jV j (u)S(X j )[I(t 1 < s < X, < fcJSfa) 

+ /(tl <Xj<S< t2)S(t2)])/R(Xj). 



3=1 



3.4. Construction of confidence bands. From the large sample results, we 
can construct pointwise confidence intervals in the form fif lt t 2 (u) ± 
n~ 1 ' 2 zat lj t 2 (, u ): where z is the standard normal critical value and ^ti,ts( u ) = 
(u,u). Since we are estimating mean functions of processes, it is also 
of interest to construct confidence bands for a given level of significance. We 
will replace z in a pointwise confidence interval by a larger value b to reach 
an appropriate simultaneous coverage probability. Although y/n(fki,t2( u ) ~ 
*2 ( u )) converges to a Gaussian process, the limiting process may not have 
independent increment because V(u) is an arbitrary process. Thus, it may 
not be possible to compute the exact asymptotic distribution. To construct 
confidence bands, we approximate the limiting process by a multiplier boot- 
strap method described as follows: 

1. Generate random multipliers {Gi, i = 1, . . . , n} which are independent 
standard normal distributed and independent of the data. Then, com- 
pute 



\R(Xi)(S(h)-S(t 2 )) 



1=1 



S(Xi)Vi(u)- 



H tl ,t 2 (Xi,u) 



2. Repeat step 1 until m versions of W(u) are obtained, denoted by {Wk(u),k = 
l,...,m}. 

3. Obtain b which is the (100 — a)-percentile of max( 0>ro ){| Wfc(u)|}. 

4. The confidence band for fi tl! t 2 (u),u £ [0, To], can be calculated by /if li f 2 («)± 
n- l / 2 ba tl ,t 2 {u). 
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The above method uses the simulated samples W(u) to approximate the 
distribution of £(u), the influence function of p,t lt t 2 (u). The method is mo- 
tivated by the construction of confidence bands for survival function in a 
proportional hazards model proposed by Lin, Fleming and Wei (1994). By 
the permanence of the Donsker property [van der Vaart and Wellner (1996)], 
W(u) can be shown to converge to a Gaussian process. Also, conditional 
on observed data, E(W(u)W(v )) equals the right-hand side of (3.4) since 
E(GiGj) = for i ^ j and E(Gf) = 1. Hence, E(W(u)W(v)) converges al- 
most surely to Ct lt t 2 {u,v) an d W(u) has the same asymptotic distribution 

as v^*(Ati,ta(«) -Ati,ta( u ))- 

In the medical cost example, ^ti,t 2 {u) is nonnegative and it is more mean- 
ingful to construct confidence intervals or confidence bands that are always 
nonnegative. For this purpose, we consider the log-transformation and the 
confidence bands have the form p, tlt t 2 (u) exp[±n~ 1 ^ 2 b*at lt t 2 { u ) / f t ti,t 2 ( u )]- b* 
can be found by the above algorithm with a slight modification in step 
3, where b* is the (100 — a)-percentile of max^ ^[\Wk(u)\/at 1: t 2 {u)]. The 
reason is that by the functional delta method, y / n/it 1) t 2 (u)(ln/x tljt2 (u) — 
hiii tl)t2 {u))/(7 tlM {u) is asymptotically equivalent to {^t 1 ,t 2 ( u ) l a ti,t 2 i u )) x 
(l/fiUfo ( u )) x £,( u ) = £( n )/°"ti,t2( n ); whose distribution is approximated by 
W(u)/a tltt2 (u). 

4. Extensions of estimation. 



4.1. Distribution and percentile estimation. In a lot of applications, in- 
cluding medical cost, the distribution of V(u) is not symmetric and is often 
highly skewed. In these cases, apart from estimating the mean function, one 
might also be interested in estimating percentile and distribution functions 
of V(u). Here we extend the marked process approach of mean estimation 
to estimate a joint distribution function, p TOtTl (m,t,u) = P(V(u) < m,T < 
t\To < T < n) where tq <t < t\. Let 



( I(V i (u)<m,X i <t,A i = l), ift>r , 
1 0, if t < r 



= I(Vi(u) < 771, T <Xi< t, Ai = 1) 

and A^ (t,u) = f* o P(V(u) < m\T = s)Ar(ds). Following the arguments in 
Section 3, p TQ ^ Tl (m,t,u) can be estimated by 

1 _l A SjXjAiKViiu) < m,r < Xj < t) 



p Tn ~ (m,t,u) = — „ 

' «%o)-%)^ R ( X *) 

This joint distribution estimate can be used for correlation analysis between 
V(u) and T. Similar to the estimator of mean function, p T0 ^ T1 (m,t,u) is a 
consistent estimate of p TQ)Tl (m, t,u). 
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Next, we consider estimating a pointwise gth-percentile function, m TOtTl (u), 
which is defined by 

P(V(u)<m q T0>Tl (u)\T <T<T 1 ) = q 

for u E [0,tq] and < q < 1. To estimate m% 0)T1 {u), consider the estimating 
function 

(p„(m,u) = — - ^-^ 

(41) „ . 

5pQ)A i J(T < X, < n)(/(y t (^) < m) - g) 

It can be seen that (p q (mo,u) converges in probability to for too = m% 0>T1 (u). 
Thus, a natural estimator of m% )T1 (u) is the zero-crossing of tp q {m,v). The 
existence of a solution is guaranteed because ip q (m,u) is increasing in to, 
and lmim^-oo <p q (m, u) < and lim m ^ 00 (p q (m,u) > 0, for < q < 1. The 
estimation of m? DiT - 1 (u) can be easily implemented in common statistical 
softwares by noting from (4.1) that this quantity can be estimated by a 
weighted empirical percentile of V(u) with weights equals to S'(Xj)Aj/(ro < 

Xi< n )/R(Xi). 

4.2. Estimation of backward rate function. When the mean rate of change 
of stochastic processes before failure events is of scientific interest, one might 
want to estimate an associated quantity r(u) = E( dV J- v ' ). In the medical cost 
example, r(u) is the mean rate of cost accrual per unit time at u time units 
before a failure event. r(u) is a measure of instantaneous change in the 
backward stochastic process. We call r(u) the backward rate function. 

Like the estimation of backward mean functions, we can only estimate 
nonparametrically a conditional version r TOjTl (u) = E( dV d ^ |tq <T< t\). 
Similar to fj, TQjTl (u), we have the following relationship: 

r TOiTl (u)(S(n) - S(t )) = E (^-I(t <T<n] 

(4.2) 

T1 S(s)e( ^MT = s)A T (ds). 



J T 



du 



To estimate r TOjTl (-u), it suffices to estimate E( dV J^' \T = s) at the jump 
points of At, which are the uncensored survival times. For each uncensored 
individual, E(^\Ti) can be estimated by 
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where k(-) is a kernel function satisfying J Q r ° k(s) ds = 1 and h > is a band- 
width parameter, which can be chosen by minimizing an integrated mean 
square error. Vi{u) is similar in nature to the estimator of the subject specific 
rate of recurrent event proposed by Wang and Chiang (2002). Substituting 
unknown quantities in (4.2) by their estimates, r TQ T1 (u) can be estimated 
by 

The estimator (4.3) can also be derived as the convolution smoothing esti- 
mator of jj> T Q,Ti- It can be shown that 



5. Numerical studies. 



5.1. Simulations. Finite sample performance of the proposed estimator 
in Section 3 and the empirical coverage of pointwise confidence intervals and 
overall confidence bands are evaluated by simulations. Data are generated 
2000 times in each simulation, and each simulated data set consists of 100 or 
400 observations. The confidence bands are constructed by simulating 1000 
sets of random multipliers in each simulated data set. 

The simulation follows data structure similar to the SEER-Medicare 
linked data. We generated survival time T from a gamma distribution with 
shape 3 and rate 1, truncation time W has half chance to be and half 
chance to be generated from a uniform-(0, 20) distribution, and censoring 
time C = W + C where C is generated from a uniform-(0, 8) distribution. 
The subset with truncation time W = represents an incident cohort and 
W > a prevalent cohort with untruncated observations satisfying T >W. 
Conditioning on T, we generated two independent latent variables Z\ and 
Z2 from a gamma distribution with shape 3 and rate T. The latent vari- 
ables are used to induce correlation between survival time and stochastic 
processes. For each subject, we generate a recurrent event process P(-) 
from a Poisson process with rate AZ±, and at each occurrence of recur- 
rent events at u time units before failure event, a variable Q(u) is generated 
from a gamma distribution with shape Z2 x [3 + 3 x I(u < 1/3)] and rate 
1. The process of interest is V(u) = f T _ u Q(s) dP(s). The generated data 
has the same structure as medical cost data, where P represents counting 
process for recurrent hospitalizations, Q represents medical cost incurred at 
a particular hospitalization and V{u) is the total medical cost in the last 
u time units of life. The recurrent event process, medical cost process and 
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failure time are correlated through latent variables. That is, medical cost 
processes are terminated by informative failure events. Our simulations gen- 
erated negative correlation between end-of-life cost and failure time, which 
also matches with the SEER-Medicare linked data (see Section 5.2). Un- 
der this setting, we are interested in estimating E(V(u)\l <T< 20) for 
u£ [0,1]. 

We compare the proposed estimator with naive complete-case estima- 
tors that have been used in the medical literature. Supposing one uses an 
unweighted sample mean based on observed deaths for the analysis, the 
direction of bias for this naive analysis will depend on whether longer sur- 
vivors or shorter survivors are being oversampled. In an incident cohort, 
naive analysis will oversample shorter survivors in general because the naive 
data set is right truncated by discarding the right censored observations. 
So the estimated mean end-of-life cost will be biased upward in the simu- 
lation. In a prevalent cohort, naive sample is subject to double truncation, 
but the effect from left truncation is more serious in the simulation and we 
oversample longer survivors in general, so the estimated mean end-of-life 
cost will be biased downward. The simulation results are shown in Table 1 
and match with this reasoning. The proposed method can correct the bias 
caused by left truncation and right censoring. The unweighted complete case 
estimator has been used, for example, in Chan et al. (1995), for studying 
the frequency of opportunistic infections for HIV infected individuals before 
death. 

The small sample bias of the proposed estimator and evaluation of the 
variance estimator is also shown in Table 1. We can see that the proposed 
estimator worked well in practical sample sizes. We also studied the empirical 
coverage of the 95% confidence bands. Let t* = mm{u:Vi(u) > for some 
i}. Since ^t 1 ,t 2 { u ) = for -u < i*, it is only meaningful to consider coverage 
probabilities for u>t*. We considered the coverage of confidence band for 
u E [£*,!•]■ The empirical coverage of the 95% confidence bands are 94% for 
both n = 100 and n = 400. The empirical coverage of the confidence bands 
are close to the nominal value for practical sample sizes. 

5.2. Data analysis. The proposed methods are illustrated by analyzing 
the SEER-Medicare linked data. We investigated end-of-life-cost for ovarian 
cancer cases diagnosed at age 65 or older among Medicare enrolles. Total 
amount charged during hospitalization is considered as medical cost in the 
analysis; this includes charges not covered by Medicare. All medical expen- 
ditures are adjusted to January 2000 value by the medical care component of 
the consumer price index, available from the website of the U.S. Department 
of Labor (http://www.bls.gov/cpi/). We compare medical cost among in- 
dividuals with different historic stages at diagnosis. There were 3766, 1400 
and 15,104 subjects classified as localized, regional and distant stages at 
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Table 1 

Summary of the simulation study: Comparisons among the proposed estimator and naive 
estimators based on unweighted complete case analysis from incident and prevalent 

cohorts, and evaluation of variance estimates and pointwise coverage probabilities of 95% 
confidence intervals using the proposed methodologies. SSE represents the sampling 
standard deviation and SEE is the sample average of the standard error estimates 

Naive estimators Proposed estimators 
Sample 



size u Truth Incident Prevalent Estimate SSE SEE Coverage 



0.1 


4.32 


5.34 


2.27 


4.19 


1.24 


1.35 


0.92 


0.2 


8.64 


10.71 


4.53 


8.41 


2.13 


2.25 


0.92 


0.3 


12.96 


16.08 


6.79 


12.62 


3.00 


3.13 


0.93 


0.4 


15.84 


19.61 


8.38 


15.4 


3.54 


3.66 


0.93 


0.5 


18.00 


22.28 


9.53 


17.5 


3.94 


4.01 


0.93 


0.6 


20.16 


24.92 


10.64 


19.57 


4.35 


4.40 


0.93 


0.7 


22.32 


27.52 


11.8 


21.62 


4.73 


4.78 


0.93 


0.8 


24.48 


30.08 


12.92 


23.64 


5.11 


5.12 


0.93 


0.9 


26.64 


32.61 


14.01 


25.63 


5.46 


5.45 


0.93 


1.0 


28.80 


35.08 


15.11 


27.58 


5.79 


5.77 


0.93 


0.1 


4.32 


5.39 


2.22 


4.29 


0.67 


0.69 


0.94 


0.2 


8.64 


10.78 


4.38 


8.58 


1.13 


1.14 


0.95 


0.3 


12.96 


16.17 


6.60 


12.86 


1.58 


1.60 


0.95 


0.4 


15.84 


19.75 


8.07 


15.72 


1.87 


1.88 


0.95 


0.5 


18.00 


22.44 


9.19 


17.86 


2.08 


2.07 


0.95 


0.6 


20.16 


25.11 


10.29 


19.98 


2.29 


2.26 


0.95 


0.7 


22.32 


27.76 


11.41 


22.09 


2.49 


2.44 


0.95 


0.8 


24.48 


30.35 


12.52 


24.17 


2.68 


2.62 


0.95 


0.9 


26.64 


32.89 


13.61 


26.20 


2.86 


2.77 


0.95 


1.0 


28.80 


35.36 


14.69 


28.18 


3.03 


2.93 


0.95 



diagnosis respectively. The estimates of the survival probabilities are shown 
in Figure 2. 

First, we compare the estimated mean forward cost functions among the 
three historic stages. A mean forward cost process is estimated by 



if S(s)I(W i <s<C l )dY l (s) 



fly it) 



which can be viewed as a limiting case of the estimator of Lin et al. (1997) 
with the partition size tending to zero. If left truncation is absent, meth- 
ods proposed by Bang and Tsiatis (2000), Strawderman (2000) and Zhao 
and Tian (2001) can also be extended to estimate forward mean functions. 
Figure 3 shows the estimates for mean forward cost functions up to the 
thirtieth year after initial diagnosis of cancer. Note that there is a crossover 
for three curves around ten years after diagnosis. The ten year estimated 
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5 10 15 20 25 30 

Time after disease onset (Years) 

Fig. 3. Estimates of the mean forward cost functions for ovarian cancer patients. Solid 
curve represents localized stage at diagnosis, dashed curve represents regional stage and 
dotted curve represents distant stage. 

survival probabilities are 0.47, 0.25, 0.07 (s.e.: 0.03, 0.06, 0.05) for patients 
diagnosed with local, regional and distinct stages respectively. In the first 
ten years after diagnosis, cumulative cost reflects the severity of the cancer 
stage at diagnosis. Beyond the tenth year, the cumulative cost reflects the 
better chance of survival for the less severe stages of cancer. The conflicting 
nature between accumulation of cost and survival complicates the analysis 
and careful interpretation of the results are needed. Also, the forward cost 
functions cannot directly answer questions about end-of-life-cost, because 
individuals have different survival times and the increase in medical cost be- 
fore failure events at a given time after disease onset is offset by nonsurvivors 
who do not contribute to any increase in medical cost after death. 

In the SEER-Medicare data analysis we observe a negative correlation 
between end-of-life-cost and survival. Using the estimator of joint distribu- 
tion in Section 4.1, the estimated Pearson correlation coefficient between 
V(l) and T (conditioned on T > r = 1) is -0.65, -0.31 and -0.46 for lo- 
calized, regional and distant stages respectively. We compare the estimated 
one-year mean backward cost functions among the three historic stages, for 
individuals surviving at least one year after onset of disease. The results 
are shown in Figure 4. Unlike mean forward cost functions, estimated back- 
ward cost functions are very similar in shape for the three historic stage 
groups. The results show that there is a terminal increase in medical cost 
before death. The estimated final-year medical cost of a patient is $31,802, 
$31,752, $38,377 (s.e.: $1229, $2205, $896) in January 2000 value for patients 
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Time before death (Months) Time before death (Months) Time before death (Months) 



Fig. 4. Estimates of the mean backward cost functions for ovarian cancer patients. 
Solid curves represent the estimates. Dotted curves represent 95% simultaneous confidence 
bands. Dashed curves represent pointwise 95% confidence intervals. 

diagnosed with local, regional and distinct stages respectively. The estimated 
medical cost for the last three months of life of an ovarian cancer patient 
is $16,365, $16,284, $18,848 (s.e.: $692, $1236, $613) in January 2000 value 
for patients diagnosed with local, regional and distinct stages respectively. 
Figure 5 shows the backward rate of cost accrual, which is the end-of-life 
cost per unit time before death. The bandwidths for the estimates were cho- 
sen to minimize an integrated mean squared error. The results agree with 
Figure 4 that there is a terminal increase in medical cost before death. 

6. Concluding remarks. In this paper we proposed statistical methods 
for studying the terminal behavior of stochastic processes before failure 
events. In particular, we discussed nonparametric methods for estimating the 
mean function of backward stochastic processes under incident and prevalent 
sampling designs. We also discussed identifiability issues related to estima- 
tion with incomplete follow-up data. In an incident sampling design, the 
right tail of survival distribution may not be identified because of limited 
study duration. Using prevalent sampling design, the identifiable region for 
the survival distribution could be enlarged and cost associates with those 
individuals can be identified. 
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Fig. 5. Estimates of the backward rate of cost accrual. Solid curve represents localized 
stage at diagnosis, dashed curve represents regional stage and dotted curve represents dis- 
tant stage. 

We used the SEER-Medicare data as an example throughout this paper. 
Although the SEER-Medicare data contain both incident and prevalent co- 
horts, our method can be applied to data only from an incident cohort or a 
prevalent cohort. The proposed methods only require the stochastic process 
data to be available in a certain time interval before a failure event. Thus, 
prevalent data can be used alone for the proposed methods even though we 
do not have data on the stochastic process before patient enrollment. 

The backward estimation procedure proposed in this paper could serve as 
a main building block for other analyses. For example, we can compute the 
ratio of end-of-life cost to lifetime cost that combine the proposed method 
and the existing methods for analyzing lifetime cost. 

Although we use medical cost as an example, applications of the proposed 
methods do not only limit one to study medical cost, but can also be used 
to study the terminal behavior of other stochastic processes before failure 
events. Other applications include CD4 counts before AIDS diagnosis, fre- 
quency of hospitalizations before death and measurements of quality-of-life 
before death. 

The main focus of this paper is one sample estimation of the backward 
mean function. The authors are extending the idea in this paper to regression 
models of backward mean functions and backward rate functions. 

APPENDIX: PROOF OF THEOREM 3.1 

We apply empirical process theory to prove the asymptotic results. Since 
N v (t,u) is having bounded variations and E(N V (t,u)) < oo for (t,u) G 
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[to,ti] x [0,To], we can apply the uniform strong law of large numbers [Pol- 
lard (1990)] to show that N v (t, u) converges a.s. uniformly to E(N V (s,u)) = 
f° o E(V{u)I(T = s))P(C >s> W)//3ds. Also, R(s) converges a.s. uniformly 
to G{s) [Woodroofe (1985)]. By Lemma 1 of Lin et al. (2000), 



uniformly on [to,ti] x [0, tq]. Also, since S(t) and A^ (t,u) are uniform con- 
sistent estimates of S(t) and A}^(t,u), uniform consistency of /zj li j 2 (n) also 
follows from Lemma 1 in Lin et al. (2000). 

Defining M^t) = N^t) - Jj R i (s)A T {ds) for t > 0, M^(t,u) = NV(t,u) - 
Ri(s)A^ Q (ds,u) for t > tq and u G [0, ro], we have 

" l N(ds,u) [* E(N(ds,u)) 



R(S) Jr G(S) 

1 ^E{N(ds, u) - E{N(ds, «))) 



G(s) 
V^{R(s) - G(s)) 



E(N(ds,u)) +o p (l) 



G 2 (s) 

^ y/nN(d8,u) /•* y/nR(s) v 

1/2 £ / 

-•1 ■/Tn 



'to ^V"y J To 



^J^) +0p (l 
-X ■'713 "A 5 / 



and 



Vn(tk 1: t 2 (u) -fHi,ts(u)) 

1 f* 2 S(s)N v {ds,u) 



s(ti) - s(t 2 ) K R{s) 

1 f t2 S(s)E(N v (ds,u)) 



S(h) - S(t 2 ) J tl G(s) 
V^[(S(h) - S(h)) - (S(t 2 ) - S(t 2 ))] au . . v 



S(t)A v T (ds,u) 



+- c77T^ c77 S / 2 5( S )v^(Ajf (d S ,u)-A^(d S ,u)) 
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n 

n ~ V2 + + &(«)) + Op(l), 

t=i 



where 

6i(«) 



£(V(u)|*i <T<t 2 ) 



2 J 



G(s) y "J G(s 



1 f t2 S{s)M^{ds,u) 



S(ti)-5(t 2 )7 tl G(a 



^ u) =-wrmL s{s) Jo w A {ds ' u) - 

Upon algebraic manipulation, £i(u) = £,u(u) + £,2i(u) + £,3i(u) reduces to 



t2 S{s)M^(ds,u) 



S(h)-S(t 2 )J tl G(s) 

1 f t2 H tut2 {s,u)M l (ds) 



&(«) 



- 5(ta)) 2 J tl G( S ) 

Since £j(u) can be written as sums and products of monotone functions of 
u, therefore, {£,i(u)} forms a manageable sequence [Pollard (1990), Bilias, Gu 
and Ying (1997)]. The weak convergence of v^G^ti^C") ~~ ^tx,t 2 i u )) follows 
from the functional central limit theorem [Pollard (1990)]. 
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